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Abstract 

A new approach is introduced to estimating object surfaces in three-dimensional space from a 
sequence of images^ A surface of interest here is modeled as a 3-D function known up to the values of , — /:(£■" 

a few parameters. The approach will work with any parameterization. However, in work to date we 
have modeled objects as patches of spheres, cylinders, and planes.—primitive objects. These primitive 
surfaces are special cases of 3-D quadric surfaces. Primitive surface estimation is treated as the general 
problem of maximum likelihood parameter estimation based on two or more functionally related data 
sets. In eu^oe, these data sets constitute a sequence of images taken at different locations and orien- 
tations. A simple geometric explanation is given for the estimation algorithm. Though various tech- 
niques can be used to implement this nonlinear estimation, we discuss the use of gradient descent. 

Experiments are run anJ discussed for the case of a sphere of unknown location. These experiments 
graphically illustrate the various advantages of using/as^many images as possible in the estimation and 
of distributing camera positions from first to last over as large a baseline as possible. In order to 
extract all the usable information from the sequence of images, all the images should be available 
simultaneously fcr the parameter estimation. We introduce the use of asymptotic Bayesian approxima- 
tions in order to summarize the useful information in a sequence of images, thereby drastically reducing 
both the storage and amount of processing required,. The attractiveness of ouf Bayesian approach is that 
now all the usual tools of statistical signal analysis can be brought to Iredr, the information extraction 
appears to be robust and computationally reasonable, the concepts are geometric and simple, and essen- 
tially optional accuracy should result. / 


I. Introduction 

Essentially all 3-D object surface estimation from multiple views to date is based on either active stereo using a 
laser and one or two cameras for triangulation, or on passive stereo involving matching points in two images and using 
triangulation, or on optical flow [Ij, (10), [11]. We suggest a new approach in which surfaces of complex objects are 
approximated by a few patches of 3-D parameterized surfaces, and these parameters are estimated from two or more 
images taken by calibrated cameras from different locations and directions. These parameterized patches are referred to 
as primitive objects. We formulate the parameter estimation problem as standard maximum likelihood estimation, given 
two or more functionally related data sets. Estimation accuracy is achieved by processing data in blocks (which may be 
large), in addition to processing man. .nages and with camera positions distributed over as large a baseline as possible. 
The actual processing is simple standard statistical signal analysis. This approach, first presented in [4J, is completely 
new as far as we know. In summary, thie contribution of this paper is the treatment of 3-D surface inference as a stan- 
dard maximum likelihood parameter estimation problem requiring low data storage capacity and where parameter esti- 
mates are updated recursively as each new image in a sequence of images is received and processed. 

Central to 3-D surface estimation from two (or more) images taken from cameras in different locations and orienta- 
tions is the pairing of points from two images that are images of the same point on a 3-D surface. This matching of 
points in two images is usually done :n either of two ways, (i) If the two cameras are physically close and their optical 
axes are almost parallel, then their images will differ from one another only by translation— one will be a shifted version 
of the other. Then image 1 can be partitioned into patches, and each patch cross-correlated with image 2 to find its loca- 
tion in image 2. Once this correspondence is known, the location of the surface region in 3-D space seen in the pair of 
corresponding image patches can be determined by triangulation. Since the surface region seen is usually curved, one 
would like the patches to be small in order to locate the surface region seen accurately. However, if the images are 
noisy, large surface patches must be used to accurately estimate a pair of corresponding patches in the two images. 


71 



Significant triangulation errors occur when the camera optical axes are close together and almost parallel because of 
matching errors due to image noise, and because 3-D object surfaces are curved. Additional tri angulation error occurs 
because there is some error in camera calibration, (ii) An alternative approach that permits a large angle between the 
camera optical axes to improve triangulation accuracy is to locate corresponding small local features in the two images. 
An example of such a feature is a vertex of a polyhedron. For a curved surface, contours on the surface are features 
often used to be matched in pairs of images. The difficulty here is that a large amount of pattern recognition may be 
necessary to recognize a pair of corresponding features in the two images. Past efforts at cross-correlation of large image 
patches, as in (i), has been unsuccessful here because a patch in one image will be a distorted version of a corresponding 
patch in the other image. 

The work closest in spirit to ours is the recent work of Faugeras, Ayache and Favtijon [8], who develop the idea 
of estimating points and lines on a 3-D object surface, or planar surfaces, from a sequence of images. More specifically, 
they assume that the probability distribution for the estimates of points on a surface based on a pair of images is known. 
They then assume that a sequence of such estimates and associated distributions are known for a sequence of images. 
Their contribution, then, is to use the extended Kalman filter for combining this sequence of estimates to obtain improved 
estimates of the surface points. They derive the equations for estimating lines, and suggest that it can be extended to 
planes. Among the errors 4hey take into account, are those in camera calibration. Their concept is important, though 
they do not tackle here the problem of optimally estimating the surface points or lines directly from the data in the 
images. 

Our paper is an expansion of one where our 3-D surface estimation algorithm was first proposed [4]. In subsequent 
papers, we showed that our basic estimation algorithm is maximum likelihood estimation, and derived Cramer-Rao 
irreducible lower bounds on the parameter estimation error covariance matrices [6), and we also discussed the use of 
Markov Random Field (stochastic process) models for 3-D surfaces [3] as a generalization of the use of parameterized 
surface models. These and the present paper together constitute a new Bayesian theory for 3-D surface estimation based 
on a sequence of noisy images. 

Sections ll.A - U.C introduce the transformations necessary for understanding the relation of images in two or more 
views. Sections III. A • 11I.B describe the performance functional and the gradient descent algorithm used in estimating 
the a priori unknown 3-D object parameters based on the use of two images. Section I1I.C provides a very simple 
geometric interpretation of the algorithm. Sections 1V.A - 1V.D extend the approach for use of a sequence of images that 
might be taken by a moving camera. In order to arrive at a computationally feasible algorithm, we introduce the use of 
maximum likelihood estimation here. This development also points out that the algorithm described in section III is 
maximum likelihood estimation. The importance of this observations is that maximum likelihood estimators are known 
to converge to the true parameter values, and are known to have minimum estimation error covariance as the number of 
observations become large. In section V we introduce a somewhat different estimator for a moving camera, and point 
out that it hes certain desirable computational properties but is less accurate. This algorithm is somewhat similar to the 
use of optical flow . 


1I.A Notation and Description of Camera Motion 

Let P be a point in 3-D space and r = (x y z) Tf be its coordinates in the fixed orthogonal world reference frame. 
Since we assume that objects do not move, this reference frame is fixed with respect to the objects viewed by the cam- 
era, and we will call it the object reference frame (ORF). Let r(n) = (x a y a zJ T be the coordinates of the point P in 
CRFn, the reference frame attached to camera n. This reference frame is such that: (1) the camera optical axis is parallel 
to the z a axis, and it looks at the negative z a axis; (2) the x a and y a axes are parallel to the sides of the image; (3) the 
origin of the reference frame coincides with the center of the image plane. The image is corrected so that the view is 
rot inverted top to bottom and left to right, i.e., a central projection is used. 

Let B(n) denote the 3x3 orthogonal rotation matrix that specifies the three unit coordinate vectors for CRFn in 
terms of the three unit coordinate vectors for the ORF. Let r c (n) specify the origin of CRFn in the ORF. Then 

r{n) = B T (n) [r - r c (n)j. and r = B(n)r(n) + r c (n). (1) 

The rotation matrix B(n) and the translation vector r c (n) are known for calibrated cameras. In this paper, we will use b a 
to represent a vector having as its components the parameters that specify both B(n) and r c (n). 


t A symbol is boldface U a column vector, a superscript capital T attached to a vector denotes vector transpose. 


1LB SvfactParaMleriiatlM 

Our approach is applicable to any parameterized surface. A few researchers have used differential geometric pro- 
perties, such as Gaussian curvature and mean curvature, to describe surfaces, see (2). These are useful for surface 
parameterization because they *e coordinate free, in general, the surfaces we want to estimate can be described by an 
inqriicit function with respect to the ORF: 

8(r;a) = g(x,y,z;a)*0, (2) 

where a is the parameters describing the surface with respect to the ORF. For example, the equation for the general qua- 
dratic surface is 

a M x 2 + 2a u xy + 2a 13 xz + a^y 2 + 2ajjyz + a M z 2 + 2a M x + 2aj4y + 2 a 34 Z + *u * 0 • 0) 

In this case we denote a » (an, * u , •••♦ *4«) T - 


1LC Images of an Object Surface Point in Two Image Frames 

As shown in Fig. l.a, P denotes a point on a parameterized 3*D surface of interest This surface is described by a 
function in the ORF (see section II.B). The function is uniquely determined by specifying the values of a parameter vec- 
tor a. Point P on the object surface is seen as points having coordinates s and u in images 1 and 2, respectively. We 
assume a Lambertian reflectance model. Then the images of point P at s and u will have the same intensity. The tech- 
niques proposed will not apply to specular reflectors, without modification, because the location of points on the object 
surface at which specular reflection occurs depends on the camera location. Since most surfaces of interest are largely 
Lambertian, the assumption is a useful one. Hence, 

Mu) * Ms) < 4 > 

where Ij(u) and Ij(s) are the picture functions (image intensity functions) in Frames 1 and 2, respectively. For those 
cases where the Lambertian assumption does not apply, a possible modified approach is to use an edge map. Here, pix- 
els are given values of 128 or 0 depending on whether they are detected as being edge points or non edge points, respec- 
tively. These maps arc then smoothed to obtain more continuous arrays, and these are used as though they are regular 
picture functions in our estimation algorithms. The usefulness of the edge map is that it is a representation of rapid 
changes in the object surface patterns, and largely unaffected by the presence of some specular component in the object 
surface. Experiments using edge maps with our algorithm are described in [6J. 

For simplicity, we use the orthographic projection model 17] for image formation, i.e., alt rays from points on the 
object surface to the camera are roughly parallel. (With slight modification, all of our results can be used with the per- 
spective projection model.) Let r(l) = (x, yi z,) T be the coordinates of the 3-D surface point P with respect to CRF1, 
and r(2) = (X 2 y* Zj) T be the coordinates of the point P with respect to CRF2. Then, under the assumption of ortho- 
graphic projection, 

s = (*i yt) T . «* = (*2 yi) T - 

If we pick a point s in image plane I, it will correspond to some point P on the 3-D surface. If this point P is also seen 
in image 2, its image in image plane 2 will occur at some coordinate u. Therefore, given some point s in image 1, if we 
want to compute the corresponding image point u in image 2 based on the current estimation of a, we can: 

(i) lint, find the 3-D location of the corresponding surface point P; 

(ii) then, find the image point u corresponding to P. 

In step (i), represent the surface point P with respect to CRF1 by r(l) = (x ( yi Zt) T . Using equations (1) and (2), 
the equation of the surface is 

g(r, a) = g( B(l)r(l) + r c (l) ; a) = g( B(l)(xj y, z,) T +r c (l) ; a) = 0. (5) 

Since the point P resides on the surface, r(l) must satisfy the above equation. Therefore, given s = (x, y,) T , we can 
solve equation (3) for z t . An example far the spherical surface is given in the next section. 

In step (ii), we want to compute u. Now that we have obtained r(I) from step (i), using equation (1) we can com- 
pute u = (xj yj) T by 
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rt2) - l*i yj rj) T - B t ( 2) [r - r c (l)] - B T (2)B(l)r(l) + B T (2) [r«(l)-r^)]. 
Call C ■ B t ( 2)B(1) and d * B T (2) (r c (l) - r e (2». Then, partition the C matrix and d vector as: 

[C|| 


[Cat C22 

when Cji is 2x2, c t2 and c J are 2x1, cn is a number, e is 2x1, and d] is a number. From the preceding: 

u » C u a + C| 2 *i(s,bi,a) + e. 

Combining steps (i) and (ii) above, we denote the functional relationship (6a) between a and u by 

u = h(s, b. z(*,a) ), 

Inhere the vector b includes bj and b* and specifies Cu, c^, and e. 


( 6 ) 


(6a) 

(7) 


UI> Estimation of the Parameterized Surface Using Two Images 

If we know the camera position, b, and the true surface parameters, *r» then 

I|(s)~I 2 (h(s,b,z(s* r ))) 

for each s. Choose a region in image 1. Denote this pixel set in this region by D. Consider the error measure 

«o(«) = E [l 1 (s)-I 2 (h(s,b,z(s,a)))] 2 . 


Then « D (a) is a minimum at a = aj>. Our problem is to estimate aj by minimizing (9) with respect to a. 
To estimate *r that minimizes (9), we choose to use the gradient method as follows: 

de 0 (a b) 


* 0+1 = *» “ 


da 


•A., 


3r D (a # ) 

where A, depends on ^(aj and ■ and has magnitude that goes to 0 as n goes to infinity. 


da 


dry 


( 8 ) 

(9) 

( 10 ) 


There are several ways to compute the gradient — r-“. We present one of the methods used in our experiments. 

ait 


Taking the derivative of (9) with respect to a, we have 

_ r. . ,1 31*00 

"aT = ^ L 1,(s) “ Il(u> J - aS - * 

where u is a function of a as shown in (7). Use of the chain rule gives* 

^ 2 (») _ dl 2 (u) du dz(s» a) 
da du dz da 
dl 2 (u) 

where ti => h(s, b, z(s,a) ) as in equation (7). The first term — - — can be computed approximately using the sobel 

du du 

operator. The second term — is just a constant provided that we assume the orthographic projection model This can 

dz 

be shown as follows. From equation (1) 


(ID 


( 12 ) 


r(2) = B t ( 2) [r - r c (2)]. 


( 1 ) 


and upon using the notation 


r(2) = (x 2 y 2 zj) T , u = (x 2 y 2 ) T , r = (x y z) T , 


* Tht notation used her* i> that 


dl 2 (u) 

da 


iuK component row vector, where K is the number of the component! in column vector a. 


7U 


we have • ■ (B T (2), 3 B t (2)b) t , where B T (2)» means the ij* element of matrix B T (2). 

02 

In general, it may be inconvenient to express z as an explicit function of a. Hence, we compute the third term by 

&(». «) <?g(x,y,z > a) / dg(x,y,z40 m 

da da dz ' 


BULB An Example: The Sphere 

To illustrate the approach, consider a spherical surface described by the equation 

(x - x<j) 2 + (y - yo) J + (z-zo) 2 =*R a . (14) 

For this surface, z can be solved for explicitly, via 

x-z 0 ±(R , -(x-J.) J -(y-y.) J ) ,n . (15) 

The positive square root is used since the outside surface of the sphere is seen by the camera looking in the negative z 
direction. Hence, 

. dz(s, a) . dz dz dz dz . 

1 da , = dy D dz„ dR ' 6) 

*( (x - xj/(z - zj, (y - yj/(z - zj, 1, R/Cz-zJ ) 

and z — z 0 *■ (R 2 — (x — xj 1 - (y - y 0 ) 2 ) l/2 . The vector can be computed directly from this. 

da 

The analogous equations for planes, cylinder and general quadrics are presented in [6]. 

UI.C Algorithm Operation Interpretation 

Fig. lb is useful for illustrating, in two dimensions, the operation of our algorithm for estimating a<. Spheres in 3- 
D are shown as circles. Consider the processing of the image patch between points s' and s" in Frame 1. This patch is 
the image of the patch between points p' and p" on the true sphere labeled at. The same patch on the sphere surface 
gives rise to the image patch between points u' and u" in Frame 2. Now suppose the system’s estimation of a ( is ft. The 
associated sphere is shown. The performance functional for the estimate of a is given by (9) and is computed as follows. 
The system thinks that the locations on the sphere surface that give rise to the images at points s' and s" in Frame 1 are 
the intersections of the dashed lines, from s' and s", with the sphere labeled ft. These sphere surface points would be 
seen as the images at point b' and b" in Frame 2. Hence, the system takes the image patch between points b' and b" in 
Frame 2 and assumes that the image at each point u in this interval is the same image as the image at a point s in the 
interval between s' and s" in Frame 1. The points u and s are related geometrically as in the figure, or algebraically by 
(4). Performance functional (9) requires computing this error Ii(s) - 1 2 ( h(s, b, z(s,a)) ). 

We make the following interesting observations. From the geometry of image formation in Fig. lb, the varying 
scale change that maps the image patch over interval [s', s"J in Frame 1 into the image patch over interval [u\ u"] is 
seen. Note that both a scale change and a translation are involved in this 2-D illustration. 

If the incorrect a is used in computing the performance functional (9), the patch of image used in Frame 2 is that 
over the interval [b\ b"]. Note that this interval is both a shift and a varying scaling of the interval [u' f u"J. If instead 
of a sphere, we were dealing with 3 planar surface, the scale change would be constant throughout the image. 


IV Estimation of Parametrized Surface Based on a Sequence of Images 

Now suppose a sequence of images is available for estimating %, the true parameters of the surface. How can 
best use be made of this data set? In this section we develop a computationally reasonable approximately maximum 
likelihood estimator (mie) for aj-. 

IV Jk The Model 

The model that we use for the n m image I B (u), veD v is that of some true picture function p*(u) plus additive 
white noise having variance o 2 . Hence, I,(u), ueD a , is a set of random variables having joint probability density 
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function (pdf) 


(2iic 2 )"^exp ]£ |-(l/2o 2 ) [l tt (u) - M,(u)j |> 


(17) 


where d. is the number of pixels in D m . We introduce the more compact notation: ^s, a) - and 

« (u o) ■ h _1 (u,b„a), where b a is the transfortnadon parameters specifying the n* camera position. Let I» denote the 
v«tor of picture function values. U, it has components I.(u) ueD.. Let M. denote the vector having component 
M.(u), ueD,. Then (17) is a function of the parameter vector (||J o 1 a T ) T - Because of the Lambertian assumption for 

image formation, 

H«(u) = }ii(s n (u,a)). (18) 

Hence, the u. for all r can be specified in terms of ft. Then a = (|i? o 2 « T ) T is a parameter vector that spedfi« the 
pdfs (17), for all I,. Since the additive image noise is independent from image to image, the log of the joint pdf o 

It.. ...I n i* 


L N (a) = In p(Ii, Ij» • • • » In I ®0 


- (in 2nd 1 ] - (l/2o a )2| S [w«) “ ft(Sa(u,»))] 2 j- 


(19) 


Our goal is to find fl N that maximizes (19). Since this estimate is a maximum likelihood estimate (mle), we know that it 
has certain desirable properties such as converging to % as ^ havin & minimum covariance matrix for the 

B»1 

error in the estimation of », as £d, becomes large. The difficulty here is that ft is a priori unknown. Hence, in order 

to compute fl N we must simultaneously compute Ain. *e estimate of Hi based on Isunl,...,I N . ft Th ? U8 i*j!? 1S lool “ a 
formidable computational challenge, it is in fact easily manageable. In [6] we showed that (9), the performance func- 
tional we minimize for estimating » r in the two picture case, is equivalent to (19) for N-2. 

IV.B The Asymptotic Representation 

As in section I1I.A, gradient methods can be used for minimizing -L N (a). A problem here is that N images must 
be stored and processed simultaneously. This incurs both a great amount of storage and a large amount of prowssing for 
each N. An effective approximation for avoiding this storage problem can be had as follows. Let denote 
I,,I 2 I N . In [3] it is shown that 


p(I N |a) = p(I N |4N)exp^ 


-(l/2)(a - An) T ¥(In. ft nX a ~ & n) 


-} 


( 20 ) 


(21) 


where the function ¥(]>,, A*) is a KxK. matrix having ijth element 

and K is the number of components in a. Hence, (20) has a Gaussian shape in a with mean a N and inverse covariance 
matrix 4^( Ijm, A N ). _ _ 

Now suppose we wish to compute A^+i. We can write pUn+iIo) = p(I*la)p(lN + ila), so that upon using (20), 
there results 


Ln+^ci) ~ 


Z ln P< I »l ft N) 


f**\ 


- -j(a - A n ) t T(]^, ftN)(a - An) + lnp(I N+1 1 a) 


( 22 ) 


The aoDeal of (22) is that all the useful information in I* is summarized in the quadric form, i.e., the second term on the 
Heh^hand side of (22L Notice that only the two rightist terms in (22) are functions of o. Now <Wt «« * f°“" d 
appro lately as the « that maximizes (22). Gradient descent can be used on (22). The gradtent here ts s.mply 
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~dLw»i( a ) 

da 


= H'(l N ,ft N Ma- - rjjtapflNMla) . 


(23) 


There is considerable computation here, since there are M 2 components for ii t in an MxM pixel patch, and a would 
therefore have M J + K + 1 components. A simplification is possible upon realizing that since the dependence_of (22) on 
|if is as a sum of two quadrics in |i{, a simple explicit value can be found for fti(N+i) ^ terms of An, In+i» ^(1n» ^n)» ^ '» 
and a. The resulting function to minimize is then a function of only o 2 and a, hence, only K+l parameters. Gradient 
descent can be used for this purpose. This solution is explored in [9]. Though this should provide the most accurate 
estimate for Or, for a number of reasons we have minimized a simpler function. 


IV.C Approximate Likelihood Maximization 

In this section, we treat Ij(s), seD, as if it were m(s). Then iL t is no longer treated as unknown -- only o 2 and a 
are unknown. If our goal is to estimate a only, then we do not have to estimate o 2 since o 2 gives information only about 
the accuracy of the estimate for a (see [6]) but does not affect the value of the estimate for a (see 19]). Hence, a = a. 
For practical reasons, instead of letting D n be an arbitrary subset in image plane n, we proceed analogously to the two 
image problems in Sec. HI. A.. Hence, (17) becomes 

p(I B | a) = (2jto 2 )" d,/2 exp| £ [i n (u a (s,a)) - I t (s) j j. (24) 

Then, 


4*, (a) = lnp(!jj| ApfcO 2 ) - - a N ) T T(]N, flN.^Xa - a N ) 

+ lnp(I N +ila.° 2 )- 


(25) 


Now, our goal is to compute &n+i> the value of a that minimizes the negative of (25). We suggest a gradient descent 
algorithm similar to that used in Sec. I1I.A. Let a N+u denote the estimate for aj after the k m iteration in the N+l* stage 
(i.e., the N+l tl stage is that following the input of the N+l* image and prior to the input of the N+2 nd image). Then we 
compute &N+! as the limit of in (26). 


dL N+l (a) j 

Sn+ix+i = &N+i,k + SCALE’ t 

da !■«**, 


u 


From (24) and (25), 


T~" ~ ^(In^n^X® - &n) + o -2 £ [(N+i( u N+i( s > a )) ~ lt(s)] 

da mO, L J 

Once An+i is computed. '?(I^,il N ,a 2 ) can be updated to 'F(I^ + i,fiN+i,o 2 ) by 


dI N ^,(UN>l(M)) 

da 


'FdN+l^N+t* 02 ) = '{'(IhjAn*^) - 


with -- —- In p(I N>1 1 a.q 2 ) a matrix having ijth element 
dada 




-D. 


lN+l( u N+l(S*SN+!))“)t( s ) ] 


d^N+XUN+lfo^N+l)) 

dajdai 


^lnp(I N .,|.V)l 


dlN+i(UN + i(s4N+l)) dlN-*-l(UN+l(S.3N^i)) 


daj 


da* 


For brevity, denote T( Ifj, fi N , o 2 ) by 4V Then the incremental stereo algorithm is summarized as follows: 


(26) 


(27) 


(28) 


(29) 
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1. Read image 1. 

2. Set = 0. 

3. For N 2 I. 

4. Read image N+l. 

5. Compute fi N by using Eq. 26 iteratively until it converges. 

6. Compute NFn+i by Eq. 28. 


IV.D Experiments With the Algorithm in Section 1V.C 

Figure 2 shows a sequence of nine computer generated images of a sphere. The images were generated by taking 
a few images of faces with a solid state T.V. camera, and using the computer to project these images onto a sphere. 
Using the pattern on the sphere generated in this way, the computer was then used to generate the images that should be 
seen by a camera at nine locations and with a specified CRF at each location. For this experiment the camera moved 
along a circular arc of radius 2000 units lying in a horizontal plane. The camera optical axis pointed to the center of this 
arc, and there was no rotation of the image plane about the optical axis. The angles between the camera optical axes in 
successive images were 5°. The patch of subimage used in each of the nine images is the region about the left eye ot 
the rightmost face in the image. The patch of subimage is outlined as roughly a small square in white. The parameters 
specifying the sphere are (xq, yo, z 0 ) T , the sphere center, and R, the sphere radius. In the experiments run, the sphere 
radius was assumed to be known and only the center was estimated. Table 1 shows the values of Hn found. The initial 
guess used for the sphere center was in error by a little more than the sphere radius of 128 units. The final estimate is in 
error by roughly two units. The optical axis of the camera moved through an angle of 40° from its Grst to its last posi- 
tion. These images were noiseless. However, some error is introduced because images are spatially quantized into pix- 
els. Table 2 shows the estimates for * more noisy image sequence. Each image here is the image in the correspond- 
ing position in Fig. 2 plus white Gaussian noise. The added noise has standard deviation of 5 units (i.e., variance of 25 
units) The initial estimate used here is also in error by about the sphere radius. The final error, based on nine 
images, is a little bit more than that in Table 1, but it is small. The accuracy of the algorithm appears to be remarkably 
good considering the small patches of data used in the estimation. In practice, image 1 would be partitioned into many 
squares, and a sequence of estimates would be obtained for each. The information obtained from each patch would be 
optimally combined using the methods presented in [3], thereby greatly improving the accuracy of the estimate ot a-r. 
With the initial error used here, the algorithm in (26) went through about 8-10 iterations to compute at eac ^ stage. 

Figure 3 contains plots of e D (a), equation (30), as functions of x 0 and yo, with z 0 held fixed at its true value, -2000. 

e D (a) = Z Z ~ Hi(s a (u.a))|\ (30) 

a=l u«O a ^ 

The purpose of these plots is to show how e D (a), which is the function that must be minimized to maximize (19), nar- 
rows in the vicinity of its minimum as the number of images used increases. Since the height of eo(a), i.e., the distance 
between its minimum and maximum is an increasing function of the number of images used, we have only plotted the 
functions in the vicinity of their minima. That is, the plots stop at a height of roughly 3000 units above the minima. 
The functions shown are based on the use of 2, 6, and 10 images, respectively. It is seen that the funcUons narrow 
appreciably in going from the use of two images to the use of 10 images. In Fig. 4, curves oi e u (a) are again shown , 
but only two images ate used in each case. However,, the angle between the pair of camera optical axes varies, with 
angles of 1°, 5°, and 45° for the three plots shown. Notice how broad and flat the bottom of the curve associated with 1 
is whereas the curve associated with 45° is much narrower, as expected. However, it is still not as narrow as the curve 
in Fig. 3c where the angle between the optical axes of the first and tenth cameras is 40°. Hence, both the range o 
angles spanned and the number of images used is important. The other observation of interest is that the functions ‘ n 
Fig 3 and those in Figs. 4a and 4b are smooth, whereas that in Fig. 4c is not The multimodal benavior of Fig. 4c is 
due to the high frequencies in the pattern on the sphere surface. The effect is moderated when the angle between the 
optical axes of a pair of images is small, and the effect is also suppressed by the averaging that takes place when many 
more than two images are used. 
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V Incremental Stereo 

A slightly different formulation is to write the joint likelihood for the image differences I2 - It, 
I4 - 1) , . . . , I N - 1^!. The joint likelihood can be written as 

£ [l2a(«2n(U.a)) - (31) 

Here, u 2a (u,a) in l2a(U2 D (u,a)) denotes the point in D 2a that the point u in D 2 b-i maps to. The mean value functions 
litfSataa)) do not appear here since the expectation of I 2a (u 2a (u,»r)) ” ^2a-t( u ) f° r each u is 0. Also, the variance of this 
difference for each u is 2 o*. Then a^ is to be chosen to minimize 

-lj + 2(a) * f — — -ln(4no 2 ) + £ T? T> I 2o(U2a(u,a)) - I 2n -i(u) . (32) 

a-l 1 0-1 4 <T 1 J 

Again, it is computationally undesirable to store the N+2 images and also to process all of them simultaneously in order 
to compute a^* Hence, as in Sec. 1V.C., we use an asymptotic approximation, Oauisian in a, to represent (31) when 
computing 8|sN2- 

Table 3 contains the estimates a« based on a sequence of images including those in Fig. 2. Note that the angle 
between the optical axes for the flrst and last camera positions used for the images in Fig. 2 is 40°. The viewing angle 
spanned by for the 18 camera positions used in computing Table 3 is 85°. Notice that even with using 18 images*— 9 
pairs of differences — the algorithm in Sec. IV.C is considerably more accurate. The reason is that the algorithm minim- 
izing (32) uses only the differences in pairs of images taken with camera optical axes thff are almost parallel. Hence, it 
is small baseline stereo and suffers many of the disadvantages of the use of optical flow. If the images are noisy, the 
relative accuracy of this algorithm would probably degrade considerably. It is interesting to note that the size of the 
angle between the optical axes of the flrst and last images is not very important here. Rather, improved accuracy comes 
from using many pairs of images in order to average out the effects of noisy perturbations. 


On the other hand, small angle stereo permits computational advantages which we briefly touch upon. If the cam- 
era does not move much in going from the (2n-l)th to the (2n)th position, u 2a (u,a), where ueD 2n -i, is close to u since 
CRF(2n) is close to CRF(2n-l). Hence, we can use the Taylor series expansion: 


l2n(“2 n (U,a)) = fcaOO + 


dl2n(u) 

du 2n (u,a) 

t)u 

0b 


(33) 


Thus, 


U2 B (U 2 a (U,a)) - I 2 a -i(u )] 2 a 


' r 

<H2n(U> 

3u 2n (u,a) 


* [l2n(u) “ l2n-l(u) j + 

0 U 

t ab 

Ab 

< 


( 34 ) 


where Ab is an incremental vector specifying the incremental rotation and the incremental origin translation for CRF(2n) 
in term of CRF(2n-l). The desirability of the approximation is that in minimizing (32) with respect to a it is no longer 

oi 2a (v) j 

necessary to compute the u^u.a) and then the arrays I 2 <Va) and — - — i for all ue£) 2a _i. Rather we can just 

dv 

use the arrays I 2a (u) and — - — , uef) 2a _t, directly. This makes for a considerable reduction in required computation. 

9u 2n (u,a) 

Furthermore, note that when computing the gradient of (34) with respect to a, only the term — ^ — is a function of a, 
and this function is very simple as seen in Gq. 6a. 

The final remark of interest is that few the planar surface described in the appendix, the use of Eqs. 6, 34, and A2 
(from the appendix) in Eq. 32 permits a simple explicit solution for a^j. the value of a that minimizes (32). 


VI Conclusion 

In this paper, for the first time the joint likelihood of two or more images as a function of the a priori unknown 3- 
D surface to be estimated is derived. This permits the full range of Bayesian analysis, estimation, and recognition tech- 
niques to be applied to the 3-D surface inferencing problem. In particular, in this paper we develop a recursive 
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algorithm for the maximum likelihood estimation of a parameterized surface based on a sequence of images taken, 
perhaps, by a moving camera. This recursive estimator should be significantly more accurate than the use of the 
extended Kalman filter, since the latter uses a linearization about the N* stage estimate to compute the N+l* stage esti- 
mate whereas we use the complete information in the N+l* image. 


APPENDIX: The Plane 

We derive the expression for the vector dz/da for a plane. Note that there are a number of different sets of param- 
eters that can be used for representing a plane (or a cylinder, or a more general surface). We use the canonical parame- 
terization in this section. We use the equation 

0 = g(x,y,z) m fj,x + Pay + P 3 z - ( A1 > 

subject to the constraint 


0 = f(x,y,z) * p? + p| + P| - 1 


(Ala) 


Note, |d| is the distance from the plane to the origin in this representation. It is assumed that the plane is in general posi- 
tion, because if, e.g., J3 3 = 0, then the plane normal is orthogonal to the first camera’s optical axis, and the plane surface 
is not seen by the first camera since the camera then sees only the plane's edge. Eq. (Ala) can be used to solve for pj 

in terms of Pi and p 2 . Hence we can take a to be a T = (Pi.Pfcd). Now Using (Ala), we get 


ar/api _ -sp, ^ afc _ jwafc _ P* Hence & _ 

— -^,-TrT - ' na" " ~ aT • Similarly, n • Hence, _ 


ap!_ 

api af/ap 3 2p 3 pj 


ap 2 


is. 

da 


JSl. 

apt 


s* X + Z 


af/ap 3 

api 

aps 


Pi* 


= X - z 


-is. = y + z 

apt y ap2 

ifi. = -i 

dd 


Pi 

— 3 

Pi 

Pay - Pi* 
Pi 


dz 

Pi* - Pi* 
Pi 


Pi 


Thus, 


dz 

da 


pjz - p 3 x p 2 z 


PI 


PI 


Pay J_ 
’Pi 


(A2) 
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(a) with 1° between optical axes (b) with 5° between optical axes (c) with to 0 between optical axes 

Fig. 4 Error function based on two images 
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R 

initial a 

T/o.o 

50.0 

-2080.0 
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O 

26.8 

20.3 

-2001.8 

128 

4 

16.0 

13.3 

-2001.1 

128 

6 

10.1 
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-2001.4 

128 

3 

8.6 

3.8 

-2001.2 
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10 

10.3 

8.9 

-1999.9 

128 

12 

5.3 

4.9 

-2002.1 

128 

14 
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16 

5.0 

-1.4 
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18 

-0.8 
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0.0 
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-2000.0 
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